# Code-only replication repository for Wong et al 2025 BJPS

This set of files includes all of the code necessary to reproduce all of the
results in the paper. It contains no data. Nearly every step of the analysis
combines Canadian census Dissemination Area (DA) data with Census Subdivision
(CSD) data. These data could be used to identify survey respondents, so we do
not include them here. Researchers who would like access to these data should
contact Cara Wong <carawong@illinois.edu>. We are in the process of submitting
these files to the ICPSR, which has a well-established procedure for the use of
restricted-access data, and in the future we will direct people to the ICPSR
for this purpose.

We recommend following the steps below, in order, to reproduce the results in
the paper. If you run into trouble, feel free to leave an
[Issue](https://github.com/bowers-illinois-edu/wong_bjps_2025_no_data/issues) on this
GitHub repository with a description of your problem and one of us will work on
helping you out.

Note: We have only used this workflow on a Mac OS computer and a Linux
computer. We have not tested it on a Windows computer. We would be happy to
publish modifications to enable broader interoperability. Feel free to contact us.


## Get an API key for use with the `cancensus` R package to download Canadian Census Data

We use Canadian Census data throughout this paper and we access it using the
`cancensus` R package. That package requires that researchers get an API key
to identify them when they access the CensusMapper servers. You can put this
API key into the `Data/get_and_setup_2006_census_data.R` and
`Data/get_and_setup_2016_census_data.R` files in order to run that part of the
workflow. These data are public, so no permission is needed to access them.
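
If you would rather not type the API key into a tracked file, one possible alternative is to supply it through the `CM_API_KEY` environment variable, which `cancensus` reads when the package loads. The following is only a sketch: the placeholder value and the `census_key_status` helper are hypothetical, and the approach assumes the two setup scripts do not set a key of their own that would take precedence.

```shell
# CM_API_KEY is the environment variable that cancensus checks when it loads.
# The value below is a hypothetical placeholder, not a real key.
PLACEHOLDER_KEY="CensusMapper_replace_me"
export CM_API_KEY="${CM_API_KEY:-$PLACEHOLDER_KEY}"

# Hypothetical helper: report whether a real-looking key is available
# before starting the census download step.
census_key_status() {
  if [ -z "${CM_API_KEY:-}" ] || [ "$CM_API_KEY" = "$PLACEHOLDER_KEY" ]; then
    echo "missing"
  else
    echo "set"
  fi
}

census_key_status
```

R sessions started from the same shell inherit the exported variable, so the key never needs to be written into the scripts.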

The public Canadian census data files that we distribute here can be
re-created using the scripts named in the following lines of
`Makefile.datasetup`:

```makefile
CENSUS_DATA_FILES = Data/CensusData/2006_Data/census_data_csd_06.rda \
          Data/CensusData/2006_Data/census_data_06.rda

$(CENSUS_DATA_FILES): Data/get_and_setup_2006_census_data.R
	$(RCMD) Data/get_and_setup_2006_census_data.R

Data/CensusData/2016_Data/census_data_16.rda: Data/get_and_setup_2016_census_data.R
	$(RCMD) Data/get_and_setup_2016_census_data.R
```

As you can see in the R scripts, you will need to install the following
packages: `here`, `sf`, `cancensus`, and `dplyr`.


## Install the Gurobi Optimization System

The `designmatch` package relies on the `gurobi` constrained optimization
software. To run the code in this repository, you'll need to install
`gurobi` by hand from the [Gurobi Quickstart
Page](https://www.gurobi.com/documentation/quickstart.html). This will involve
getting a free academic license, downloading the Gurobi software for your
system, installing it on your system, and then installing the `gurobi` R package
into the R installation used by this project (the directory containing the
`wong_bjps_2025.Rproj` R project file).

For example, on a Mac we first registered an account with Gurobi using our
academic email address and activated a single-user academic license. Then we
downloaded a file called `gurobi12.0.3_macos_universal2.pkg` and
double-clicked that file in the Mac Finder to install the Gurobi software on
our system.

After that installation, we started R within the `wong_bjps_2025` directory
(for example, by starting RStudio using the `wong_bjps_2025.Rproj` file) and
then typed the following at the R console:

```r
install.packages("/Library/gurobi1203/macos_universal2/R/gurobi_12.0-3_R_4.5.0.tgz",repos=NULL)
```

Before running any of our files that rely on the Gurobi optimizer, we had to
activate the gurobi license using a command like `grbgetkey 1234...` at the
unix command line on our Mac computers (using the Terminal application).

**As an alternative to gurobi** you could also try the open-source `highs`
package. Install it from within R using `install.packages("highs")` and then
change the arguments to the `solver` lists that we provide to the `nmatch()`
functions. We have not tried that solver for this paper, and we expect that
some of the results would differ.

## Install the R Packages Used in the Paper

This paper uses many R packages, and we kept our collaboration running smoothly
by using the [renv](https://rstudio.github.io/renv/articles/renv.html) system
to keep track of packages and their dependencies. You should be able to install
all of the packages (except for the `gurobi` package, which has to be installed
by hand first) by typing `renv::restore()` at the R command line and following
the instructions. Start the R session from the root of this project directory
(for example, by opening the `wong_bjps_2025.Rproj` file in RStudio).

```r
## If you haven't already installed renv do so using install.packages('renv')
renv::restore()
```

You may have to work a bit with the `renv` system to avoid errors. For example,
the first time you type `renv::restore()` it may complain that your version of
`gurobi` is not the same as the one we used originally (12.0-2). In that case
you may need to reinstall the version you want. We had to issue the following
command before we used `renv` on this replication archive, and then again after
seeing some warnings from `renv`:

```r
install.packages("/Library/gurobi1203/macos_universal2/R/gurobi_12.0-3_R_4.5.0.tgz",repos=NULL)
```

And then you may need to see if there are other packages that need reinstalling
by typing `renv::status()` and `renv::snapshot()`.

For example you might see:

```r
> renv::status()
The following package(s) were installed from an unknown source:
- gurobi [12.0-3]
renv may be unable to restore these packages in the future.
Consider reinstalling these packages from a known source (e.g. CRAN).
```

But, at the end, when you quit R and restart it, you should see an R console
saying something like this which indicates that all relevant packages are
installed and you are ready to try to build the paper.

```r
R version 4.5.0 (2025-04-11) -- "How About a Twenty-Six"
Copyright (C) 2025 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20

...

- Project '~/Documents/PROJECTS/wong_bjps_2025' loaded. [renv 1.1.5]
> 
```


## Build the Manuscript using the GNU Make system

We use the [make](https://www.gnu.org/software/make/) system to keep track of
the dependencies among the files in this project. This means that, if you are
using a Mac or Linux machine, you should be able to open a Terminal window, `cd`
to the directory containing the replication files, and then type
`make Manuscript/manuscript.pdf`.

If you are using RStudio we recommend that you open the `wong_bjps_2025.Rproj`
file --- this is an R Project file that will make sure that you are in the
correct working directory. It also allows you to use the `make` system by going
to the "Build" tab and clicking on "Build All".

You can see whether your system is ready to use `make` by typing `which make`
in the Terminal. If you do not see a path to the `make` command (like
`/usr/bin/make`), then you will need to install GNU Make.

**To install GNU Make on a Mac** you need to install the "command line tools"
by running `xcode-select --install` at the Unix command line in the Mac
Terminal.
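
Before starting a long build, it can help to confirm that the command-line tools this workflow calls are on your `PATH`. The following is a minimal sketch rather than part of our Makefiles: `check_tool` is a hypothetical helper, and the tool list is taken from the commands mentioned in this README (`make`, `R`, `latexmk`, and Gurobi's `grbgetkey`).

```shell
# Hypothetical helper: report where a command lives, or that it is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok       $1 -> $(command -v "$1")"
  else
    echo "MISSING  $1"
  fi
}

# make, R, and latexmk appear in the build steps; grbgetkey ships with Gurobi.
for tool in make R latexmk grbgetkey; do
  check_tool "$tool"
done
```

Any line starting with `MISSING` points to software that needs to be installed, or added to your `PATH`, before the build can finish.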

Whether you type `make Manuscript/manuscript.pdf` at the command line or click
"Build All" from the RStudio "Build" menu or run each of the files below in
order, this will take some time since (1) the nonbipartite matching problems
take time to solve and (2) we present Bayesian multilevel model results which
require MCMC sampling.

You can see all of the steps required to produce the `manuscript.pdf` file by
typing `make -n Manuscript/manuscript.pdf`, where you should get output like
this (maybe with some files repeated) showing which files should be run and in
which order:

```bash
## If you need the Canadian census data, you might need to download it first
R --no-save --no-restore -f Data/make_working_files.R
R --no-save --no-restore -f Design/dist_mats_data_anyDA_new.R
R --no-save --no-restore -f Design/match_anyDA_new.R
R --no-save --no-restore -f Analysis/analysis_anyDA_new.R
R --no-save --no-restore -f Analysis/supp_desc_new.R > Analysis/supp_desc_new.Rout
R --no-save --no-restore -f Figures_Tables/figures_anyDA_new.R
R --no-save --no-restore -f Analysis/alt_explanations_analysis.R
R --no-save --no-restore -f Figures_Tables/alt_explanations_plot.R
R --no-save --no-restore -f Design/dist_mats_data_DA_new.R
R --no-save --no-restore -f Design/match_DA_new.R
R --no-save --no-restore -f Analysis/analysis_DA_new.R
R --no-save --no-restore -f Figures_Tables/figures_DA_new.R
R --no-save --no-restore -f Data/sameDAdat.R
R --no-save --no-restore -f Analysis/analysis_sameDA.R
R --no-save --no-restore -f Figures_Tables/figures_sameDA.R
R --no-save --no-restore -f Analysis/sameDAviewDA.R
R --no-save --no-restore -f Figures_Tables/figures_sameDAviewDA.R
R --no-save --no-restore -f Design/dist_mats_data_anyDA_Diversity_new.R
R --no-save --no-restore -f Design/match_anyDA_Diversity_new.R
R --no-save --no-restore -f Analysis/analysis_anyDA_Diversity_new.R
R --no-save --no-restore -f Figures_Tables/figures_anyDA_Diversity_new.R
R --no-save --no-restore -f Figures_Tables/plot_pairwise_social_cohesion_anyDA_new.R
R --no-save --no-restore -f Design/match_assess_anyDA_new.R > Design/match_assess_anyDA_new.Rout
R --no-save --no-restore -f Figures_Tables/coefplot_table_anyDA.R
cd Manuscript && latexmk -pdf manuscript.tex
```
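
Because the dry-run output can repeat files, a small filter can reduce it to an ordered list of the R scripts involved. This is a sketch under the assumption that each R step has the `R ... -f path/to/script.R` form shown above; `list_r_steps` is a hypothetical helper name, not something defined in this repository.

```shell
# Hypothetical helper: read `make -n` output on stdin and print each R script
# named after a -f flag, in first-seen order and without duplicates.
list_r_steps() {
  awk '$1 == "R" {
    for (i = 2; i < NF; i++)
      if ($i == "-f" && !seen[$(i + 1)]++) print $(i + 1)
  }'
}

# Usage, from the root of the replication directory:
#   make -n Manuscript/manuscript.pdf | list_r_steps
```

Lines that do not start with `R`, such as the final `latexmk` step, are ignored, so the list only covers the R scripts.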

You can see the relationships between the files as laid out in the different `Makefile`s in this graphic:

<img width="3338" height="1052" alt="workflow" src="https://github.com/user-attachments/assets/efc9c5a2-36bd-4b71-b5f0-3102beb93ac4" />


We have not tested our workflow on a Windows computer and we would love
advice/pull requests about how to use our Makefile in that context.
